Abstract

Best-response mechanisms (Nisan, Schapira, Valiant, Zohar, 2011)
provide a unifying framework for studying various distributed protocols
in which the participants are instructed to repeatedly best respond to
each other's strategies. Two fundamental features of these mechanisms
are convergence and incentive compatibility.

This work investigates convergence and incentive compatibility
conditions of such mechanisms when players are not guaranteed to always best
respond but instead play an imperfect best-response strategy. That is,
at every time step every player deviates from the prescribed best-response
strategy according to some probability parameter. The results explain to
what extent convergence and incentive compatibility depend on the
assumption that players never make mistakes, and how robust such protocols
are to "noise" or "mistakes".

1 Introduction 

In many distributed protocols the participants, termed players, have to play
(or can be seen as playing) some underlying base game over and over (or until
some equilibrium point is reached). Hence, an appealing theoretical model for
describing these protocols is provided by game dynamics. Nisan et al. [NSVZ11]
introduce a class of game dynamics, called best-response mechanisms, in which
the players are instructed to always best-respond to what the other players are
currently doing. They identify an interesting class of games for which the
resulting dynamics satisfies:

• Convergence. The dynamics eventually reaches a unique equilibrium point 
(a unique pure Nash equilibrium) of the base game regardless of the order 
in which players respond and of the presence of concurrent responses. 

*Work partially supported by PRIN 2008 research project COGENT (Computational and
GamE-theoretic aspects of uncoordinated NeTworks), funded by the Italian Ministry of
University and Research.

• Incentive compatibility. A player who deviates from the prescribed
best-response strategy can only worsen his/her final utility; that is, the
dynamics will reach a different equilibrium that yields weakly smaller payoff.

Note that convergence and incentive compatibility say that the protocol will
eventually "stabilize" if implemented correctly, and that the participants are
actually willing to do so. The class for which these features have been proved
is given by games for which a Nash equilibrium is computed by iteratively
eliminating "useless" strategies, called never best-response (NBR) strategies. In
fact, Nisan et al. [NSVZ11] showed that this class of games captures several
protocols and mechanisms arising in computerized and economic settings: (1)
the Border Gateway Protocol (BGP) currently used in the Internet, (2) a
game-theoretic version of the TCP protocol, and (3) mechanisms for the classical
cost-sharing and stable roommates problems studied in microeconomics.

In this work we address the following question: 

What happens if players do not always best respond? 

Is it possible that when players "occasionally" deviate from the prescribed pro- 
tocol (e.g., by making mistakes in computing their best-response) then the pro- 
tocol does not converge anymore? Can such mistakes induce some other player 
to adopt a "non-best-response" strategy that results in a better payoff? 

Our contribution. This work investigates convergence and incentive
compatibility conditions of the best-response dynamics/mechanisms in [NSVZ11] when
players are not guaranteed to always best respond but instead play an
imperfect best-response strategy. That is, at every time step every player deviates
from the prescribed best-response strategy according to some probability
parameter p > 0. The parameter p can be regarded as the probability of making a
"mistake" every time the player updates his/her strategy. We prove the following
results:

• Convergence. Convergence to the pure Nash equilibrium may fail
even when p is exponentially small in the number n of players
(Theorem 6). This negative result also applies to certain instances of
BGP games. It is complemented by a general positive
result saying that p needs to be polynomially small with respect to some
parameters defining the schedule of the players (Theorem 8). This also
gives a bound on the time needed to converge, which is just a bit more than
the upper bound in [NSVZ11] for "perfect" best-response (p = 0). Note
that in our setting we assume a reasonably weaker adversarial schedule
(Definition 4).

• Incentive compatibility. We show that this feature requires a slightly
stronger condition than the one given in [NSVZ11], which takes into
account the parameter p and, essentially, the possibility that the other
players do not "completely" discard their NBR strategies.

• Generalized games and equilibria. We also consider a more general class
of games in which the elimination of NBR strategies will only result in a
subgame (and not necessarily in a unique strategy profile). In this case,
when p is small enough, the dynamics is essentially the dynamics of the
subgame, and thus the equilibrium of the subgame provides a good description
of the equilibrium of the dynamics, regardless of the kind of equilibrium
in which one is interested. Furthermore, the time to reach such an
equilibrium is the sum of the time needed to "converge to the subgame" and the
time needed to reach the equilibrium by the dynamics that runs on the
subgame only (Theorem 15).

These results indicate to what extent convergence and incentive compatibility 
depend on the assumption that players never make mistakes, and how robust 
such protocols are to "noise" or "mistakes". 

Further related work. Our imperfect best-response dynamics are essentially
equivalent to the mutation model by Kandori et al. [KMR93], and to the mistakes
model by Young [You93] and Kandori and Rob [KR95]. A related model is the
logit response dynamics of Blume [Blu93], in which the probability of a mistake
depends on the payoffs of the game. The dynamics studied in these works are
based on a specific schedule of the players (the order in which they play in the
dynamics). Whether such an assumption affects the selected equilibrium is the
main focus of a recent work by Alós-Ferrer and Netzer [AFN10].

2 Definitions 

Games. We consider an n-player game in which each player i has a finite
strategy set $S_i$ and utility function $u_i$. Sometimes we assume that each player
also has a tie-breaking rule $\prec_i$, i.e., a total order on $S_i$ that depends solely on the
player's private information: such a tie-breaking rule can be implemented in a
game by means of suitable perturbations of the utility function. Let us now
recall some definitions from [NSVZ11].

Definition 1 (Never Best Response). A strategy $s_i$ is a never best-response
(NBR) for player i if, for every $s_{-i}$, there exists $s_i'$ such that
$u_i(s_i, s_{-i}) < u_i(s_i', s_{-i})$. In the case that a tie-breaking rule $\prec_i$ has been
defined for player i, $s_i$ is a NBR for i also if
$u_i(s_i, s_{-i}) = u_i(s_i', s_{-i})$ and $s_i \prec_i s_i'$.

Definition 2 (Elimination Sequence). An elimination sequence for a game G
consists of a sequence of subgames

$$G = G_0 \supset G_1 \supset \cdots \supset G_r = \hat{G},$$

where any game $G_{k+1}$ is obtained from the previous one by letting some player
$i^{(k)}$ eliminate strategies which are NBR in $G_k$.

The length of the shortest elimination sequence for a game G is denoted
by $\ell_G$ (we omit the subscript when it is clear from the context). It is easy
to see that for each game $\ell_G \le n(m-1)$, where m is the maximum number of
strategies of a player. Our results will focus on the following classes of games.

Definition 3 (NBR-Reducible and NBR-Solvable Games). The game G is
NBR-reducible to $\hat{G}$ if there exists an elimination sequence for G that ends in
$\hat{G}$. The game G is NBR-solvable if it is NBR-reducible to $\hat{G}$ and $\hat{G}$ consists of
a unique profile.

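As an example, consider a 2-player game, in which each player has strategy
set {0, 1, 2} and utilities as follows (rows are the strategies of the first
player, columns those of the second):

$$\begin{array}{c|ccc}
  & 0 & 1 & 2 \\ \hline
0 & 0,\,0 & 0,\,0 & 0,\,-2 \\
1 & 0,\,0 & -1,\,-1 & -1,\,-2 \\
2 & -2,\,0 & -2,\,-1 & -2,\,-2
\end{array} \qquad (1)$$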



Notice that strategy 2 is a NBR for both players. Hence, there exists an
elimination sequence of length 1 that reduces the above game to its "upper-left" 2 × 2
subgame with strategy set {0, 1} for each player. Therefore, this game is
NBR-reducible. If we add the tie-breaking rule "prefer strategies with smaller index",
then the game reduces further to the profile (0, 0) and hence it is NBR-solvable.
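The elimination on game (1) can be replayed mechanically. The following is a minimal Python sketch, not the authors' procedure: the payoff dictionary is transcribed from table (1), the helper names (`utility`, `is_nbr`, `eliminate`) are hypothetical, and the tie-breaking rule is assumed to be "a tie is lost to a strategy of smaller index".

```python
# Payoffs of game (1): U[(r, c)] = (row utility, column utility).
U = {
    (0, 0): (0, 0),  (0, 1): (0, 0),   (0, 2): (0, -2),
    (1, 0): (0, 0),  (1, 1): (-1, -1), (1, 2): (-1, -2),
    (2, 0): (-2, 0), (2, 1): (-2, -1), (2, 2): (-2, -2),
}

def utility(player, own, other):
    """Utility of `player` (0 = row, 1 = column) playing `own` against `other`."""
    profile = (own, other) if player == 0 else (other, own)
    return U[profile][player]

def is_nbr(player, s, sets, tie_break):
    """True if `s` is never a best response for `player` in the subgame `sets`."""
    for other in sets[1 - player]:
        beaten = False
        for alt in sets[player]:
            if alt == s:
                continue
            u_s = utility(player, s, other)
            u_alt = utility(player, alt, other)
            # With tie-breaking, a tie is lost to a strategy of smaller index.
            if u_alt > u_s or (tie_break and u_alt == u_s and alt < s):
                beaten = True
                break
        if not beaten:
            return False  # s is a best response to `other`
    return True

def eliminate(tie_break):
    """Iteratively remove NBR strategies until no player can remove any."""
    sets = [{0, 1, 2}, {0, 1, 2}]
    changed = True
    while changed:
        changed = False
        for player in (0, 1):
            nbr = {s for s in sets[player] if is_nbr(player, s, sets, tie_break)}
            if nbr and len(nbr) < len(sets[player]):
                sets[player] -= nbr
                changed = True
    return sets
```

Calling `eliminate(False)` leaves the strategy sets {0, 1} for both players, while `eliminate(True)` leaves only the profile (0, 0), matching the discussion above.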



Dynamics. A dynamics is usually specified by two rules: a selection rule, which
specifies for each time step the subset of players that are selected for updating
their strategies; and an update rule, which specifies how a player updates her strategy
(possibly depending on the past history and on the current strategy profile). In
this work we focus on the following classes of selection and update rules.

Definition 4 ((R, ε)-Fair Selection Rule). A selection rule is (R, ε)-fair if there
exists a nonnegative integer R such that, for any interval of R time steps, all
players are selected at least once in this interval with probability at least 1 − ε.

As an example, scheduling players in round-robin fashion or concurrently are
obviously (R, 0)-fair selection rules with R = n and R = 1, respectively, whereas
selecting a player at random at each time step is (R, ε)-fair with $R = O(n \log n)$.
Observe that if a selection rule is (R, ε)-fair, then, for every δ > 0, all players are
selected at least once with probability at least 1 − δ in an interval of
$R \cdot \frac{\log(1/\delta)}{\log(1/\varepsilon)}$
time steps (this holds because the probability 1 − ε is guaranteed for any interval
of R time steps). We also denote by η the maximum number of players selected
for update in one step by the selection rule. Note that η ≤ n.
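To make the two quantitative claims about fair selection rules concrete, here is a small Python sketch under stated assumptions. The first function uses the standard union bound for uniformly random selection (one player per step), which is one way to see why $R = O(n \log n)$ suffices; the second rounds the number of blocks up to an integer, an assumption of this sketch. Both function names are hypothetical and 0 < ε < 1 is assumed.

```python
import math

def miss_probability_bound(n, R):
    """Union bound on the probability that some player is never selected in R
    steps when one of n players is chosen uniformly at random at each step."""
    return min(1.0, n * (1 - 1 / n) ** R)

def steps_for_coverage(R, eps, delta):
    """Steps after which an (R, eps)-fair rule has selected every player with
    probability at least 1 - delta, using whole blocks of R steps.
    Requires 0 < eps < 1 and 0 < delta < 1."""
    blocks = math.ceil(math.log(1 / delta) / math.log(1 / eps))
    return R * max(1, blocks)
```

For instance, with n = 10 players and R = 47 steps the union bound is below 0.1, and an (R, 0.5)-fair rule needs 4 blocks of R steps to reach δ = 0.1.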
As for the update rule, we give the following definition. 

Definition 5 (p-Imperfect Update Rule). In a p-imperfect update rule each
player updates her strategy to a NBR with probability at most p.

As an example, the best-response update rule is 0-imperfect, whereas the logit
update rule [Blu93] (see Appendix A for a brief overview) is p-imperfect with

$$p \le \frac{m-1}{m-1+e^{\beta}}$$

for all games in which the payoffs of a non-best response and a best response differ
by at least one¹ and each player has at most m strategies.
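The bound on the logit rule can be checked numerically. The sketch below assumes the usual form of the logit rule, in which a strategy is chosen with probability proportional to $e^{\beta u}$; the function names are hypothetical. It measures the probability of choosing any strategy that is not a best response, which upper-bounds the probability of choosing a NBR.

```python
import math

def logit_probabilities(utilities, beta):
    """Logit choice probabilities: P(s) proportional to exp(beta * u(s))."""
    top = max(utilities)
    # Utilities are shifted by the maximum for numerical stability.
    weights = [math.exp(beta * (u - top)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def mistake_probability(utilities, beta):
    """Probability that the logit rule picks a strategy that is not a best response."""
    top = max(utilities)
    probs = logit_probabilities(utilities, beta)
    return sum(p for p, u in zip(probs, utilities) if u < top)

def mistake_bound(m, beta):
    """The bound (m - 1) / (m - 1 + e^beta) stated in the text."""
    return (m - 1) / (m - 1 + math.exp(beta))
```

The bound is tight when all non-best responses are exactly one unit of utility below the best response, e.g. utilities (1, 0, 0).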

Henceforth, we refer to any dynamics whose selection rule is (R, ε)-fair and
whose update rule is p-imperfect as an imperfect best-response dynamics.
We highlight that we do not put any other constraint on the way the dynamics
runs. In particular, we allow both the selection rule and the update rule to
depend on the status of the game, that is, on a set of information other than the
current strategy profile.

3 NBR-solvable games 
3.1 A negative result 

In this section we will show that the result about convergence of the
best-response dynamics in NBR-solvable games given in [NSVZ11] is not resistant to
the introduction of "noise", i.e., there is a NBR-solvable game and an imperfect
best-response dynamics that never converges to the Nash equilibrium even for
very small values of p. Specifically, we will prove the following theorem.

Theorem 6. There exists an n-player NBR-solvable game G and an imperfect
best-response dynamics with parameter p exponentially small in n such that, for
every integer t > 0 and for every 0 < ε < 1, the dynamics is in the Nash
equilibrium of G after t steps with probability at most ε.

The game. Consider a NBR-solvable game with n players and two strategies,
0 and 1. The elimination sequence consists of players 1, 2, ..., n eliminating
strategy 0 one by one in this order (note that 1 is a dominant strategy for
player 1 and, more generally, strategy 1 is dominant for i in the subgame in
which all players 1, ..., i − 1 have eliminated 0). The subgame $\hat{G}$ consists of the
unique PNE, that is, the profile 1 = (1, ..., 1).

The p-imperfect update rule. All players play the following p-imperfect
update rule:

• Player i chooses strategy 0 with probability p if all players j < i are playing
strategy 1;

• Player i chooses strategy 0 with probability 1 − q if at least one player
j < i is playing strategy 0, where 0 < q ≪ p.

¹ When the minimum difference is δ this extends easily by taking $\beta_\delta = \beta \cdot \delta$ in place of β.

The $(2^{n-1}, 0)$-fair selection rule. Let us start by defining sequences $\sigma_i$, with
i = 1, ..., n, recursively as follows:

$$\sigma_1 = 1, \quad \sigma_2 = 12, \quad \sigma_3 = 1213, \quad \ldots, \quad \sigma_i = \sigma_{i-1}\sigma_{i-2}\cdots\sigma_1\, i.$$

Observe that each sequence $\sigma_i$ has length $2^{i-1}$. The selection rule schedules players
one at a time according to $\sigma_n$ and then repeats.

A key observation about this selection rule is in order. 

Observation 7. Between any two occurrences of player i < n there is an 
occurrence of some player j > i. 
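The recursive schedule and Observation 7 are easy to check for small n. The following Python sketch builds $\sigma_i$ from the recursion above and verifies the observation on a few consecutive repetitions of $\sigma_n$; the function names and the number of repetitions are choices of this sketch, not part of the paper.

```python
def sigma(i):
    """Schedule sigma_i = sigma_{i-1} sigma_{i-2} ... sigma_1 i, as a list of players."""
    seq = []
    for j in range(i - 1, 0, -1):
        seq.extend(sigma(j))
    seq.append(i)
    return seq

def satisfies_observation_7(n, repetitions=3):
    """Check Observation 7 on `repetitions` consecutive copies of sigma_n."""
    schedule = sigma(n) * repetitions
    last_seen = {}
    for t, player in enumerate(schedule):
        if player < n and player in last_seen:
            # Players scheduled strictly between the two occurrences of `player`.
            between = schedule[last_seen[player] + 1 : t]
            if not any(j > player for j in between):
                return False
        last_seen[player] = t
    return True
```

For example, `sigma(4)` yields the schedule 1 2 1 3 1 2 1 4, of length $2^{3} = 8$.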

Intuitively speaking, this property causes any bad move of some player in the
sequence $\sigma_n$ to propagate to the last player n, where by "bad move" we mean
that at time t the corresponding player $\sigma_n(t)$ plays strategy 0 given that each
player $j < \sigma_n(t)$ plays 1 (thus, a bad move occurs with probability p).

Proof of Theorem 6. Throughout the proof we will denote $2^{n-1}$ by τ for the
sake of readability.

Let $X_t$ be the random variable that represents the profile of the game at
step t. We will denote by $X_t^n$ the n-th coordinate of $X_t$, i.e., the strategy
played by player n at time t. Suppose that player n plays 0 at the beginning.
Then, for every t < τ, the probability that at time step t the game is in a Nash
equilibrium is obviously 0. Consider now t ≥ τ. The probability that at time
step t the game is in a Nash equilibrium is obviously at most the probability
that $X_t^n = 1$. Hence it will be sufficient to show that $\Pr(X_t^n = 1) \le \varepsilon$. Note
that $X_t^n = X_{c\tau}^n$, c being the largest integer such that t ≥ c · τ. Since both the
update rule and the selection rule described above are memoryless, for every
profile x

$$\Pr\left(X_{c\tau}^n = 1 \mid X_{(c-1)\tau} = x\right) = \Pr\left(X_\tau^n = 1 \mid X_0 = x\right).$$

Let us use $\Pr_x(X_\tau^n = 1)$ as a shorthand for $\Pr(X_\tau^n = 1 \mid X_0 = x)$. Moreover,
let $\bar{B}$ be the event that no bad move occurs in the interval [1, τ] and let $B_t$ denote
the event that the first bad move occurs at time t ∈ {1, ..., τ}. Then

$$\Pr_x\left(X_\tau^n = 1\right) = \Pr_x\left(X_\tau^n = 1 \mid \bar{B}\right) \Pr\left(\bar{B}\right) + \sum_{t=1}^{\tau} \Pr_x\left(X_\tau^n = 1 \mid B_t\right) \Pr\left(B_t\right).$$

Note that $B_t$ has probability at most p and $\bar{B}$ has probability $(1-p)^\tau$.
Trivially, $\Pr_x(X_\tau^n = 1 \mid \bar{B}) = 1$. Moreover, by Observation 7, given a bad move
of player i ≠ n at time $t_i$, there is a sequence of time steps $t_{i+1} < t_{i+2} < \cdots < t_n$
such that player j > i is selected at time $t_j$ and is not selected again before
$t_{j+1}$. Therefore, player i + 1 plays 0 at time $t_{i+1}$ with probability 1 − q because
at that time i is still playing 0. Similarly, if player j at time $t_{j+1}$ is still playing
0, then player j + 1 will play 0 with probability 1 − q. Hence,

$$\Pr_x\left(X_\tau^n \ne 1 \mid B_t\right) \ge (1-q)^n.$$

Then

$$\Pr_x\left(X_\tau^n = 1\right) \le (1-p)^\tau + \tau p \left(1 - (1-q)^n\right) \le \frac{1}{1+p\tau} + p\tau \cdot \frac{nq}{1-q},$$

where we repeatedly used that $1 - x \le e^{-x} \le (1+x)^{-1}$.

The theorem follows by taking $p = \frac{1}{\varepsilon\tau}$ and q sufficiently small. □

Remark. It is interesting that we can instantiate the abstract game above into
the following instance of the BGP game [LSZ08, NSVZ11]:

$$u_i(0, s_{-i}) = 0, \qquad u_i(1, s_{-i}) = \begin{cases} 1, & \text{if } s_1 = \cdots = s_{i-1} = 1; \\ -L, & \text{otherwise;} \end{cases}$$

where L is a large number. Similarly, the update rule described above may be
instantiated as a logit update rule with noise β, which corresponds to setting

$$p = \frac{1}{1+e^{\beta}} \quad \text{and} \quad q = \frac{e^{-\beta L}}{1+e^{-\beta L}}.$$

Remark. Note that in the proof of Theorem 6 we considered p ≈ 1/R and showed
that the corresponding imperfect best-response dynamics does not converge. As
a consequence, it may be possible to prove fast convergence to the equilibrium
only by taking p smaller than 1/R, as done in the following section.



3.2 Convergence Time 

Given the above negative result, we wonder whether there are values of p for
which the convergence of the best-response dynamics is restored. The following
theorem states that this occurs when p is small with respect to the parameters R, η
and ℓ.

Theorem 8. For any NBR-solvable game G and any small δ > 0, an imperfect
best-response dynamics converges to the Nash equilibrium of G in $O(R \cdot \ell \log \ell)$
steps with probability at least 1 − δ, whenever $p \le \frac{c}{\eta R \cdot \ell \log \ell}$, for a suitably
chosen constant c.

The following two lemmas represent the main tools in the proof of the
theorem above. Both lemmas hold for NBR-solvable games as well as for the more general
class of NBR-reducible games. Moreover, in both lemmas we denote by $X_t$
the random variable that represents the profile of the game after t steps of an
imperfect best-response dynamics. Note also that, for an event E, we denote
by $\Pr_h(E)$ the probability of the event E conditioned on the initial status
being h.

Lemma 9. For any initial status h, we have

$$\Pr_h\left(X_{t+s} \in G_k \mid X_s \in G_k\right) \ge 1 - \eta p t, \qquad (2)$$

$$\Pr_h\left(X_{R+s} \in G_{k+1} \mid X_s \in G_k\right) \ge 1 - \eta p R - \varepsilon. \qquad (3)$$

Proof. Let the dynamics be in $G_k$ at time s and observe that if the dynamics
is not in $G_k$ at time t + s, then in one of these time steps some selected player
played a NBR. Since at every step at most η players are selected, (2) follows
from the union bound.

Similarly, if the dynamics is not in $G_{k+1}$ at time t + s given that player $i^{(k)}$
has been selected for update at least once during the interval [s + 1, s + t], then
in one of these time steps some selected player played a NBR. Hence,

$$\Pr_h\left(X_{t+s} \notin G_{k+1} \mid X_s \in G_k \cap \mathrm{SEL}_{i^{(k)},s,t}\right) \le \eta t p, \qquad (4)$$

where $\mathrm{SEL}_{i,s,t}$ is the event that player i is selected at least once in the interval
[s + 1, s + t]. Now simply observe that

$$\begin{aligned}
\Pr_h\left(X_{t+s} \notin G_{k+1} \mid X_s \in G_k\right)
&\le \Pr_h\left(X_{t+s} \notin G_{k+1} \mid X_s \in G_k \cap \mathrm{SEL}_{i^{(k)},s,t}\right) + \left(1 - \Pr\left(\mathrm{SEL}_{i^{(k)},s,t}\right)\right) \\
&\le \eta p t + \left(1 - \Pr\left(\mathrm{SEL}_{i^{(k)},s,t}\right)\right),
\end{aligned}$$

where the first inequality follows from the definition of conditional
probabilities and the last one uses (4). Since $\Pr(\mathrm{SEL}_{i^{(k)},s,R}) \ge 1 - \varepsilon$ by definition of
imperfect best-response dynamics, the lemma follows. □

Lemma 10. For any initial status h and 1 ≤ k ≤ ℓ, we have

$$\Pr_h\left(X_{kR} \in G_k\right) \ge 1 - k \cdot (\eta p R + \varepsilon).$$

Proof. Observe that

$$\begin{aligned}
\Pr_h\left(X_{kR} \notin G_k\right)
&\le \Pr_h\left(X_{kR} \notin G_k \mid X_{(k-1)R} \in G_{k-1}\right) + \Pr_h\left(X_{(k-1)R} \notin G_{k-1}\right) \\
&\le \eta p R + \varepsilon + \Pr_h\left(X_{(k-1)R} \notin G_{k-1}\right),
\end{aligned}$$

where the first inequality follows from the definition of conditional
probabilities and the last one uses (3). Since $\Pr_h(X_0 \notin G_0) = 0$, the lemma follows by
iterating the argument. □



Proof of Theorem 8. Consider an interval T of length
$R \cdot \frac{\log(2\ell/\delta)}{\log(1/\varepsilon)}$. As
discussed above, the probability that all players are selected at least once in an
interval of length T is at least $1 - \delta/2\ell$. The theorem follows by applying Lemma 10 with
k = ℓ, (R, ε) = (T, δ/2ℓ) and $p \le \frac{\delta}{2\eta\ell T}$. □
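The arithmetic behind the proof of Theorem 8 can be checked with a few lines of Python. This is a sketch under stated assumptions: the bound on p is taken to be $\delta/(2\eta\ell T)$, which is what Lemma 10 requires for the final probability to be at least 1 − δ, and T is rounded up to a whole number of blocks of R steps. The function names are hypothetical and 0 < ε < 1 is assumed.

```python
import math

def theorem8_parameters(R, eps, eta, ell, delta):
    """Return (T, p_max, steps) following the proof of Theorem 8."""
    # T = R * log(2*ell/delta) / log(1/eps), rounded up to whole blocks of
    # R steps (the rounding is an assumption of this sketch).
    T = R * math.ceil(math.log(2 * ell / delta) / math.log(1 / eps))
    # Assumed threshold on p, derived from Lemma 10 with (R, eps) = (T, delta/(2*ell)).
    p_max = delta / (2 * eta * ell * T)
    return T, p_max, ell * T

def lemma10_lower_bound(k, eta, p, R, eps):
    """Lower bound 1 - k * (eta * p * R + eps) from Lemma 10."""
    return 1 - k * (eta * p * R + eps)
```

With R = 10, ε = 0.5, η = 1, ℓ = 4 and δ = 0.1 this gives T = 70, and plugging `p_max` and ε = δ/2ℓ into `lemma10_lower_bound` returns 1 − δ = 0.9 up to rounding.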
3.3 Incentive Compatibility



Nisan et al. [NSVZ11] showed that eliminating a NBR is incentive compatible,
i.e., no player benefits by playing a NBR at some time step, for a subclass of
NBR-solvable games, namely NBR-solvable games with clear outcome. Here, a
game is said to have clear outcome if, for every player i, there is a player-specific
elimination sequence such that the following holds: if i appears for the first time in
this sequence at position k, then in the subgame $G_k$ the profile that maximizes
the utility of player $i = i^{(k)}$ is the Nash equilibrium.

In this section we ask whether the incentive compatibility property also holds in
the presence of "noise". This means that we are wondering whether the only
improvement (if any) can occur by playing a "less imperfect" best-response
dynamics, i.e., one whose update rule is p'-imperfect, with p' < p (note that every
p'-imperfect update rule with p' < p is also a p-imperfect update rule).



A Negative Answer. The following theorem shows that the incentive com- 
patibility property is not resistant to the introduction of noise. 

Theorem 11. There is a NBR-solvable game with clear outcome and an im- 
perfect best-response dynamics whose update rule is not incentive compatible. 

Proof. Consider the following game G with clear outcome (the "gray profile",
i.e., the Nash equilibrium (top, left)):

$$\begin{array}{c|cc}
 & \text{left} & \text{right} \\ \hline
\text{top} & c+2,\ 1 & 1,\ 0 \\
\text{bottom} & 0,\ 0 & 0,\ c
\end{array}$$

and suppose we run the logit dynamics for G (we already noted that the logit
dynamics is an example of imperfect best-response dynamics). We will show
that the column player has a better expected payoff by always playing strategy
"right".

The above game is a potential game and the potential Φ is

$$\begin{array}{c|cc}
 & \text{left} & \text{right} \\ \hline
\text{top} & c+2 & c+1 \\
\text{bottom} & 0 & c
\end{array}$$



It is known (see, for example, [Blu93, AFPP10]) that in this case the logit
dynamics converges to a distribution on the set of profiles such that the probability
of a profile x is proportional to $e^{\beta\Phi(x)}$. Hence, the expected utility of the column
player when she plays according to the logit update rule is

$$\frac{1 \cdot e^{\beta(c+2)} + c \cdot e^{\beta c}}{1 + e^{\beta c} + e^{\beta(c+1)} + e^{\beta(c+2)}} < 1 + c \cdot e^{-2\beta}. \qquad (5)$$



If instead the column player always plays strategy "right", then her expected
payoff is determined by the logit dynamics on the corresponding subgame and
it is equal to

$$\frac{c \cdot e^{\beta c}}{e^{\beta c} + e^{\beta(c+1)}} = \frac{c}{1 + e^{\beta}}. \qquad (6)$$

Since the right-hand side of (5) is smaller than (6) for c > 1 + , the theorem
follows. □
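The comparison in this proof can be evaluated numerically. The Python sketch below assumes the payoffs (c + 2, 1), (1, 0), (0, 0), (0, c) and the potential values c + 2, c + 1, 0, c for the profiles (top, left), (top, right), (bottom, left), (bottom, right), and a stationary distribution proportional to $e^{\beta\Phi}$. The function names and the sample values of β and c are illustrative only.

```python
import math

def payoff_following_logit(beta, c):
    """Column player's expected utility under the stationary distribution
    proportional to exp(beta * potential)."""
    # (potential, column utility) for the four profiles of the game.
    profiles = [(c + 2, 1), (c + 1, 0), (0, 0), (c, c)]
    weights = [math.exp(beta * phi) for phi, _ in profiles]
    total = sum(weights)
    return sum(w * u for w, (_, u) in zip(weights, profiles)) / total

def payoff_always_right(beta, c):
    """Column player's expected utility when she always plays 'right' and the
    row player follows the logit rule (row utilities: top = 1, bottom = 0)."""
    prob_bottom = 1 / (1 + math.exp(beta))
    return c * prob_bottom
```

For β = 1 and c = 10 always playing "right" yields the larger expected payoff, whereas for β = 1 and c = 1 following the logit rule is better, which illustrates that the deviation pays off only when c is large enough.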



Sufficient Conditions. As done for convergence, we now investigate
sufficient conditions for incentive compatibility. We will assume that utilities are
non-negative: note that there are many update rules that are invariant with
respect to the actual value of the utility function and thus, in these cases, this
assumption is without loss of generality. Moreover, when we say that player
$i = i^{(k)}$ we are assuming that $G_k$ is the first subgame in which i is asked to
eliminate a NBR strategy in her elimination sequence.

It turns out that we need a "quantitative" version of the clear outcome
property, i.e., that whenever the player i has to eliminate a NBR her utility in the
Nash equilibrium is sufficiently larger than the utility of any other profile in the
subgame she is actually playing. Specifically, we have the following theorem.

Theorem 12. For any NBR-solvable game G and any small δ > 0, playing
according to a p-imperfect rule is incentive compatible for player $i = i^{(k)}$ as long
as $p \le \frac{c}{\eta R \cdot \ell \log \ell}$, for a suitably chosen constant c, the dynamics runs for
$\Omega(R \cdot \ell \log \ell)$ steps and



where $u_i(NE)$ is the utility of i in the Nash equilibrium, $u_i^k = \max_{x \in G_k} u_i(x)$
and $u_i^* = \max_{x \in G} u_i(x)$.

We can summarize the intuition behind the proof of Theorem 12 as follows:

• If player i always updates according to the p-imperfect update rule, then the
game will be in the Nash equilibrium for most time steps and hence her
expected utility almost coincides with the Nash equilibrium utility;

• Suppose, instead, that player i does not update according to a p-imperfect update
rule. Notice that the elimination of strategies up to $G_k$ is not affected by what
player i does. Therefore profiles of $G \setminus G_k$ will be played only a small
number of times (but i can gain the highest possible utility from these
profiles), whereas for the rest of the time the game will be in a profile of
$G_k$.

Let us now formalize this idea. We start with the following lemma.

Lemma 13. For any initial status h, any 1 ≤ k ≤ ℓ and any t ≥ kR, we have

$$\Pr_h\left(X_t \in G_k\right) \ge 1 - \eta p \cdot (t - lkR) - k \cdot (\eta p R + \varepsilon),$$

where l is the largest integer such that t ≥ lkR.

Proof. We have

$$\Pr_h\left(X_t \notin G_k\right) \le \Pr_h\left(X_t \notin G_k \mid X_{lkR} \in G_k\right) + \Pr_h\left(X_{lkR} \notin G_k\right).$$

From Lemma 9 we have

$$\Pr_h\left(X_t \notin G_k \mid X_{lkR} \in G_k\right) \le \eta p \cdot (t - lkR).$$

Moreover, let h' be the status that contains all the information collected in the
first (l − 1)kR steps of the dynamics. Then by Lemma 10 we have

$$\Pr_h\left(X_{lkR} \notin G_k\right) = \Pr_{h'}\left(X_{kR} \notin G_k\right) \le k \cdot (\eta p R + \varepsilon). \qquad □$$

Remark. Observe that Lemma 13 holds even if only players . . .
are updating according to a p-imperfect update rule.

Proof of Theorem 12. Let us start by computing the expected utility of i, given
that all players are playing according to the p-imperfect update rule. Let T and
p be as in the proof of Theorem 8. Then, by applying Lemma 13 with k = ℓ and
(R, ε) = (T, δ/2ℓ) we have for any $t = \Omega(R \cdot \ell \log \ell)$

$$\Pr_h\left(X_t \in G_\ell\right) \ge 1 - 2\delta.$$

Hence, the expected utility of i will be at least $(1 - 2\delta) \cdot u_i(NE)$.

Suppose now that i does not play a p-imperfect update rule. Similarly to what was
done above, we let $T = R \cdot \frac{\log(2k/\delta)}{\log(1/\varepsilon)}$ and then, by applying Lemma 13 with
(R, ε) = (T, δ/2k) we obtain

$$\Pr_h\left(X_t \notin G_k\right) \le 2\delta.$$

Hence, the expected utility of i will be at most $2\delta u_i^* + u_i^k$ and the theorem
follows. □



4 NBR-reducible games 

In the previous section we focused on NBR-solvable games and pure Nash
equilibria. Now, we will see that some of the ideas developed there can be extended
in order to handle NBR-reducible games and more generic equilibrium concepts.
In particular, we will see that for a wide class of equilibrium concepts, the
convergence of an imperfect best-response dynamics for a NBR-reducible game G
can be analyzed by considering a restriction of this dynamics to the reduced
game $\hat{G}$.

The Dynamics as a Markov Chain. Let us start by introducing some
useful notation. We say that the game is in a status-profile pair (h, x) if h is the
set of information available and x is the profile currently played. We denote
by H the set of all status-profile pairs (h, x) and by $\hat{H}$ only the ones with
$x \in \hat{G}$. Let $X_t$ be the random variable that represents the status-profile pair
(h, x) in which the game is after t steps of the original dynamics. Then, for
every (h, x), (z, y) ∈ H we set

$$P((h,x),(z,y)) = \Pr\left(X_1 = (z,y) \mid X_0 = (h,x)\right).$$

That is, P is the transition matrix of a Markov chain on state space H and it
describes exactly the evolution of the dynamics. Note that we are not restricting
the dynamics to be memoryless, since in the status we can save the history
of all previous iterations. For a set A ⊆ H we also denote
$P((h,x), A) = \sum_{(z,y) \in A} P((h,x),(z,y))$.

The Restricted Dynamics. As mentioned before, we will compare the original
dynamics with a specific restriction to the subset $\hat{H}$ of status-profile pairs.
Now we describe how this restriction is obtained. Henceforth, when we
refer to the restricted dynamics, we will use $\hat{X}_t$ and $\hat{P}$ in place of $X_t$ and P.
Then, the restricted dynamics is described by a Markov chain on state space $\hat{H}$
with transition matrix $\hat{P}$ such that for every (h, x), (z, y) ∈ H

$$\hat{P}((h,x),(z,y)) = \begin{cases} \dfrac{P((h,x),(z,y))}{P((h,x),\hat{H})}, & \text{if } (h,x),(z,y) \in \hat{H}; \\ 0, & \text{otherwise.} \end{cases}$$

Thus, the restricted dynamics is exactly the same as the original one except
that the former never leaves the subgame $\hat{G}$, whereas in the latter, at each time
step, there is probability at most p to leave this subgame. The following lemma
quantifies this similarity, by showing that, for every (h, x) ∈ $\hat{H}$, the total
variation distance (TV)² between the original and the restricted dynamics starting
from (h, x) is small.

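Lemma 14. For every (h, x) ∈ $\hat{H}$,

$$\left\| P^t((h,x),\cdot) - \hat{P}^t((h,x),\cdot) \right\| \le \eta p t. \qquad (7)$$

Proof. The proof is by induction on t. The base case is t = 1, for which the
set of status-profile pairs (z, y) such that $P((h,x),(z,y)) > \hat{P}((h,x),(z,y))$ is
exactly $\bar{H} = H \setminus \hat{H}$ and hence

$$\left\| P((h,x),\cdot) - \hat{P}((h,x),\cdot) \right\| = \sum_{(z,y) \in \bar{H}} \left( P((h,x),(z,y)) - \hat{P}((h,x),(z,y)) \right) = P((h,x),\bar{H}) \le \eta p,$$

where the last inequality follows from Lemma 9. Furthermore,

$$\begin{aligned}
\left\| P^t((h,x),\cdot) - \hat{P}^t((h,x),\cdot) \right\|
&\le \left\| P((h,x),\cdot) P^{t-1} - \hat{P}((h,x),\cdot) P^{t-1} \right\| + \left\| \hat{P}((h,x),\cdot) P^{t-1} - \hat{P}((h,x),\cdot) \hat{P}^{t-1} \right\| && \text{(TV triangle inequality)} \\
&\le \left\| P((h,x),\cdot) - \hat{P}((h,x),\cdot) \right\| + \sup_{(z,y) \in \hat{H}} \left\| P^{t-1}((z,y),\cdot) - \hat{P}^{t-1}((z,y),\cdot) \right\| && \text{(TV monotonicity)} \\
&\le \eta p + \eta p (t-1) = \eta p t. && \text{(induction and Lemma 9)}
\end{aligned}$$

□

² See Appendix B for a review of the main properties of the total variation distance.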
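The statement of Lemma 14 can be illustrated on a toy Markov chain. The Python sketch below is an assumption-laden example, not part of the paper: it takes the restricted chain to be the original chain with transitions leaving the chosen set removed and the remaining probabilities renormalised, and it uses a generic per-step leaving probability `leave_prob` in the role of ηp. All names and the three-state chain used in the usage note are hypothetical.

```python
def step(dist, P):
    """One step of a Markov chain: row vector `dist` times matrix `P`."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def restrict(P, inside):
    """Restricted chain: transitions out of `inside` are removed and the
    remaining probabilities are renormalised."""
    n = len(P)
    Q = [[0.0] * n for _ in range(n)]
    for i in inside:
        mass = sum(P[i][j] for j in inside)
        for j in inside:
            Q[i][j] = P[i][j] / mass
    return Q

def total_variation(a, b):
    """Total variation distance between two distributions on the same states."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))

def max_tv_gap(P, inside, start, horizon, leave_prob):
    """Largest value of TV(P^t, Q^t) - leave_prob * t over t = 1..horizon,
    where Q is the restricted chain and both chains start in `start`."""
    Q = restrict(P, inside)
    d = [1.0 if s == start else 0.0 for s in range(len(P))]
    d_hat = d[:]
    worst = float("-inf")
    for t in range(1, horizon + 1):
        d, d_hat = step(d, P), step(d_hat, Q)
        worst = max(worst, total_variation(d, d_hat) - leave_prob * t)
    return worst
```

For a chain with two "inside" states that are left with probability at most 0.01 per step, `max_tv_gap` stays at or below zero (up to rounding), in agreement with a bound of the form (7).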



Equilibria and Convergence. Several different equilibrium concepts
of independent interest have been introduced, such as, for example, sink
equilibria [GMV05], correlated equilibria [Aum74] and logit equilibria [AFPP10]. We
would like to give a generic result that holds for each of these equilibria
and for any other equilibrium concept in which one may be interested. For this
reason, in the following we will consider a generic equilibrium, that is, either a
set of status-profile pairs or a distribution on these pairs. Note that each
of the equilibria described above is included in this definition. However, the
definition also includes equilibrium concepts like "the first profile that is visited
for 10 times" or "the first cycle of length 4 visited". We remark that in this case
it is critical that the equilibrium is defined on the status-profile pairs and not
just on the profiles: indeed, the status can remember the history of the game
and identify such equilibria, whereas they are impossible to recognize if we only
know the current profile.

The twofold definition of a generic equilibrium corresponds to a twofold
meaning of convergence time. Indeed, if the equilibrium is represented by a
set of status-profile pairs, then we are interested in the first time step in which
the game has reached this set. If instead the equilibrium is given by a
distribution, then we are interested in the first time step in which this
equilibrium distribution is close to the distribution on the set of profiles generated
by the dynamics.



The Main Theorem. Let us denote with r the time the restricted dynam- 
ics takes to converge to a generic equilibrium E. Then we have the following 
theorem. 

Theorem 15. For an NBR-reducible game and any small 5 > an imperfect 
best-response dynamics converges to E in 0(R ■ eloge + r) steps with proba- 
bility at least 1 — 5, whenever p < min l ^^f^^ i ^p\> f or opportunely chosen 
constants C\ , C2 . 

Proof. We will show that the dynamics will be in H after 0(R ■ eloge) with 
probability at least 1 — 5/2; moreover, if the dynamics is in H after a number t of 
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steps, then it will converge to the equilibrium in further t steps with probability 
at least 1 — 8/2. Hence, the probability that the dynamics does not converges 
in 0{R ■ eloge + r) steps will be at most 8 and the theorem follows. 

Specifically, consider an interval T of length R ■ j^r03 ■ By applying 

Lemma 10 with k = e, (i?, e) = (T, 8/Ae) and p < | • we have that for every 

(A,x)6H 

Pr (x eT G H | X = (h, x)) > 1 - 5/2 . 



Finally, note that, for every $t > 0$, the probability that the dynamics 
converges to the equilibrium in $t + \tau$ steps, given that after $t$ steps it 
is in $(z, y) \in \hat{H}$, is the same as if the dynamics started in $(z, y)$; 
that is, it equals the probability that the dynamics converges to the 
equilibrium in $\tau$ steps from $(z, y)$. If the equilibrium concept to which 
the restricted dynamics converges after $\tau$ steps is a distribution $\pi$ on 
status-profile pairs, then, from (7), the distribution after $\tau$ steps of the 
original dynamics is $\pi$ except for an amount of probability of at most 
$\eta p \tau$. On the other hand, if the equilibrium concept to which the 
restricted dynamics converges after $\tau$ steps is a set $A$ of status-profile 
pairs, then, from (7), we have 

$$\Pr\left(X_\tau \in A\right) \ge \Pr\left(\hat{X}_\tau \in A\right) - \eta p \tau = 1 - \eta p \tau ,$$

and hence, after $\tau$ steps, the original dynamics is in $A$ except with 
probability at most $\eta p \tau$. Then, by Lemma 14 and by taking 
$p \le \frac{\delta}{2} \cdot \frac{1}{\eta \tau}$, the probability that the 
original dynamics converges to the equilibrium in $\tau$ steps starting from 
$(z, y)$ is at least $1 - \delta/2$. □ 
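The arithmetic behind the last step of the proof can be sketched numerically. The snippet below is only an illustration: the function names and the values of $\delta$, $\eta$ and $\tau$ are hypothetical, and $\eta$ is treated simply as the constant appearing in the bound $\eta p \tau$ taken from (7).

```python
def largest_admissible_p(delta, eta, tau):
    """Largest p for which eta * p * tau <= delta / 2."""
    return delta / (2.0 * eta * tau)


def total_failure_bound(delta, eta, tau, p):
    """Union bound over the two phases of the proof.

    Phase 1 (reaching the restricted set) fails with probability at most
    delta / 2; phase 2 (tau further steps) loses at most eta * p * tau.
    """
    return delta / 2.0 + eta * p * tau


# Hypothetical values, chosen only for illustration.
delta, eta, tau = 0.1, 3.0, 200
p = largest_admissible_p(delta, eta, tau)
print(p, total_failure_bound(delta, eta, tau, p))
```

Any smaller value of $p$ keeps the total failure probability below $\delta$, which is the role played by the second bound on $p$ in the proof.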



Examples. Here we give several examples in which we apply Theorem 15 
to bound the rate of convergence of an imperfect best-response dynamics to 
different kinds of equilibria. Specifically, consider an NBR-reducible game G, 
such as the one described in (1), and consider an imperfect best-response 
dynamics, such as the logit dynamics. Suppose we are interested in 
evaluating the time the dynamics takes to reach a sink equilibrium: note that, 
since all profiles not in $\hat{G}$ contain iteratively dominated strategies, the 
sink equilibria of G are exactly the sink equilibria of $\hat{G}$ and then, by 
Theorem 15, it is sufficient to analyze what happens in this subgame. 

Suppose instead that we are interested in the time that the dynamics takes 
before the distribution over the profiles generated by the dynamics is close to 
the one generated by a correlated equilibrium. As before, profiles not in 
$\hat{G}$, since they contain iteratively dominated strategies, do not appear in 
the support of any correlated equilibrium. Then, again, by Theorem 15, we can 
simply analyze what happens in $\hat{G}$. 

For another interesting example, suppose we are interested in the convergence 
to the logit equilibrium. Note that, differently from what happens for 
correlated equilibria, the logit equilibrium assigns non-zero probability to 
profiles not in $\hat{G}$. However, it is not difficult to show (see Appendix C) 
that the logit equilibrium of G is very close to the logit equilibrium of 
$\hat{G}$. Hence, even in this case, by Theorem 15, bounds on the convergence can 
be easily given by focusing on $\hat{G}$. 

Finally, if we consider equilibria like "the first profile that is visited 10 
times", then the time that the dynamics takes to converge to these equilibria 
is obviously less than if we restrict the profiles to be in $\hat{G}$ and, hence, 
by Theorem 15, it is sufficient to analyze the restricted dynamics. 
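The last example can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the game (a prisoner's dilemma with the payoffs in `PAYOFF`), the value of $\beta$, the seed, and the helper names `logit_step` and `first_profile_visited` are all hypothetical, and the logit dynamics is used as one possible imperfect best-response dynamics.

```python
import math
import random

# Hypothetical prisoner's dilemma: PAYOFF[profile][i] is the utility of player i.
PAYOFF = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}
STRATEGIES = [("C", "D"), ("C", "D")]


def logit_step(profile, beta, rng):
    """One step of the logit dynamics: a random player revises her strategy."""
    i = rng.randrange(len(profile))
    options = [profile[:i] + (s,) + profile[i + 1:] for s in STRATEGIES[i]]
    weights = [math.exp(beta * PAYOFF[y][i]) for y in options]
    return rng.choices(options, weights=weights, k=1)[0]


def first_profile_visited(start, visits, beta, rng):
    """Run the dynamics until some profile has been occupied `visits` times.

    The starting profile counts as one visit, and staying put counts again.
    Returns that profile and the number of steps taken.
    """
    counts = {start: 1}
    profile, steps = start, 0
    while counts[profile] < visits:
        profile = logit_step(profile, beta, rng)
        counts[profile] = counts.get(profile, 0) + 1
        steps += 1
    return profile, steps


rng = random.Random(0)
print(first_profile_visited(("C", "C"), 10, beta=2.0, rng=rng))
```

Only the count of the currently occupied profile changes at each step, so the profile returned is indeed the first one to reach the required number of visits.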
A Logit Dynamics 

The logit dynamics for a game G runs as follows: at every time step, (i) select 
one player $i \in [n]$ uniformly at random; (ii) update the strategy of player 
$i$ according to the Boltzmann distribution with parameter $\beta$ over the set 
$S_i$ of her strategies. That is, a strategy $s_i \in S_i$ will be selected with 
probability 

$$\sigma_i(s_i \mid x_{-i}) = \frac{1}{Z_i(x_{-i})} \, e^{\beta u_i(x_{-i}, s_i)} , \qquad (8)$$

where $u_i$ is the utility function of player $i$, $x_{-i}$ is the profile of 
strategies played at the current time step by players different from $i$, 
$Z_i(x_{-i}) = \sum_{z_i \in S_i} e^{\beta u_i(x_{-i}, z_i)}$ is the normalizing 
factor, and $\beta \ge 0$. From (8), it is easy to see that for $\beta = 0$ 
player $i$ selects her strategy uniformly at random, for $\beta > 0$ the 
probability is biased toward strategies promising higher payoffs, and as $\beta$ 
goes to $\infty$ player $i$ chooses her best-response strategy (if more than one 
best response is available, she chooses one of them uniformly at random). 
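The update rule (8) can be sketched in a few lines. The function name `logit_choice` and the payoff values are hypothetical; the code only evaluates the Boltzmann distribution for one player against fixed strategies of the others.

```python
import math


def logit_choice(utilities, beta):
    """Boltzmann (logit) distribution over one player's strategies.

    utilities[k] is the payoff u_i(x_{-i}, s_k) of the k-th strategy against
    the fixed strategies of the other players; beta >= 0 is the parameter.
    """
    # Subtract the maximum before exponentiating to avoid overflow; the
    # shift cancels in the ratio, so the probabilities are unchanged.
    top = max(utilities)
    weights = [math.exp(beta * (u - top)) for u in utilities]
    z = sum(weights)  # normalizing factor Z_i(x_{-i}), up to the shift
    return [w / z for w in weights]


# Hypothetical payoffs of three strategies for one player.
payoffs = [1.0, 2.0, 2.0]
print(logit_choice(payoffs, 0.0))   # uniform over the three strategies
print(logit_choice(payoffs, 1.0))   # biased toward the higher payoffs
print(logit_choice(payoffs, 50.0))  # almost all mass on the two best responses
```

The three calls mirror the three regimes described above: $\beta = 0$, moderate $\beta > 0$, and large $\beta$.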

The above dynamics defines a Markov chain $\{X_t\}_{t \ge 0}$ with the set of 
strategy profiles as state space, and where the probability $P(x, y)$ of a 
transition from profile $x = (x_1, \ldots, x_n)$ to profile 
$y = (y_1, \ldots, y_n)$ is zero if the Hamming distance $H(x, y) \ge 2$ and it 
is $\frac{1}{n} \sigma_i(y_i \mid x_{-i})$ if the two profiles differ exactly at 
player $i$. More formally, we can define the logit dynamics as follows. 

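Definition 16 (Logit dynamics [Blu93]). Let G be a game and let $\beta \ge 0$. 
The logit dynamics for G is the Markov chain 
$M_\beta = (\{X_t\}_{t \ge 0}, S, P)$ where S is the set of profiles of G and 

$$P(x, y) = \frac{1}{n} \cdot \begin{cases} \sigma_i(y_i \mid x_{-i}), & \text{if } y_{-i} = x_{-i} \text{ and } y_i \ne x_i; \\ \sum_{i=1}^{n} \sigma_i(y_i \mid x_{-i}), & \text{if } y = x; \\ 0, & \text{otherwise;} \end{cases} \qquad (9)$$

where $\sigma_i(y_i \mid x_{-i})$ is defined in (8). 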
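The transition probabilities in (9) can be assembled explicitly for a small game. The sketch below is illustrative only: `logit_transition_matrix` and the two-player coordination game are hypothetical names and data, not taken from the paper.

```python
import itertools
import math


def logit_transition_matrix(strategy_sets, utility, beta):
    """Transition matrix of the logit dynamics, following (9).

    strategy_sets[i] lists the strategies of player i; utility(i, profile)
    returns u_i(profile). Returns (profiles, P) with P[a][b] = P(x_a, x_b).
    """
    n = len(strategy_sets)
    profiles = list(itertools.product(*strategy_sets))
    index = {x: a for a, x in enumerate(profiles)}
    P = [[0.0] * len(profiles) for _ in profiles]
    for x in profiles:
        for i in range(n):
            # Profiles reachable when player i is the one who revises.
            options = [x[:i] + (s,) + x[i + 1:] for s in strategy_sets[i]]
            weights = [math.exp(beta * utility(i, y)) for y in options]
            z = sum(weights)
            for y, w in zip(options, weights):
                # Player i is selected with probability 1/n; when y == x the
                # contributions of all players accumulate on the diagonal.
                P[index[x]][index[y]] += w / (z * n)
    return profiles, P


def coordination(i, profile):
    """Hypothetical coordination game: payoff 1 if both players agree."""
    return 1.0 if profile[0] == profile[1] else 0.0


profiles, P = logit_transition_matrix([(0, 1), (0, 1)], coordination, beta=1.0)
for x, row in zip(profiles, P):
    print(x, [round(v, 3) for v in row])
```

Entries between profiles that differ in both coordinates stay at zero, matching the "otherwise" case of (9).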

The Markov chain defined by (9) is ergodic. Hence, from every initial profile 
$x$ the distribution $P^t(x, \cdot)$ over profiles after the chain has taken $t$ 
steps starting from $x$ will eventually converge to a stationary distribution 
$\pi$ as $t$ tends to infinity. As in [AFPP10], we call the stationary 
distribution $\pi$ of the Markov chain defined by the logit dynamics on a game 
G the logit equilibrium of G. 

For the class of potential games the stationary distribution is the well-known 
Gibbs measure. 

Theorem 17 ([Blu93]). If G is a potential game with potential function $\Phi$, 
then the stationary distribution $\pi$ of the Markov chain given by (9) is 

$$\pi(x) = \frac{1}{Z} \, e^{-\beta \Phi(x)} ,$$

where $Z = \sum_{y \in S} e^{-\beta \Phi(y)}$ is the normalizing constant. 
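Theorem 17 can be checked numerically on a toy potential game. The sketch below makes two assumptions that should be read as such: the sign convention is that $\Phi$ decreases by exactly the amount by which a deviating player's utility increases (which is what makes the exponent $-\beta\Phi$ consistent with (8)), and the game is a hypothetical common-interest game, for which $\Phi = -u$ is a potential under that convention.

```python
import itertools
import math


def common_utility(profile):
    """Hypothetical common payoff: 2 if both play 0, 1 if both play 1, else 0."""
    if profile[0] != profile[1]:
        return 0.0
    return 2.0 if profile[0] == 0 else 1.0


def potential(profile):
    """Assumed potential: the negative of the common payoff."""
    return -common_utility(profile)


def logit_matrix(beta):
    """Transition matrix (9) for the two-player game above."""
    profiles = list(itertools.product((0, 1), repeat=2))
    index = {x: a for a, x in enumerate(profiles)}
    P = [[0.0] * len(profiles) for _ in profiles]
    for x in profiles:
        for i in range(2):
            options = [x[:i] + (s,) + x[i + 1:] for s in (0, 1)]
            weights = [math.exp(beta * common_utility(y)) for y in options]
            z = sum(weights)
            for y, w in zip(options, weights):
                P[index[x]][index[y]] += w / (2 * z)
    return profiles, P


def gibbs(profiles, beta):
    """Gibbs measure pi(x) = exp(-beta * Phi(x)) / Z."""
    weights = [math.exp(-beta * potential(x)) for x in profiles]
    z = sum(weights)
    return [w / z for w in weights]


def step(dist, P):
    """One step of the chain: the row vector dist times the matrix P."""
    size = len(P)
    return [sum(dist[a] * P[a][b] for a in range(size)) for b in range(size)]


profiles, P = logit_matrix(beta=1.5)
pi = gibbs(profiles, beta=1.5)
gap = max(abs(a - b) for a, b in zip(pi, step(pi, P)))
print(gap)  # numerically zero if pi is stationary
```

The same check fails if the sign in `gibbs` is flipped, which is a quick way to see that the exponent and the convention for $\Phi$ have to agree.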
B Total Variation Distance 



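The total variation distance between distributions $\mu$ and $\hat{\mu}$ on an 
enumerable state space $\Omega$ is 

$$\|\mu - \hat{\mu}\| := \frac{1}{2} \sum_{x \in \Omega} |\mu(x) - \hat{\mu}(x)| = \sum_{\substack{x \in \Omega \\ \mu(x) > \hat{\mu}(x)}} \left(\mu(x) - \hat{\mu}(x)\right) .$$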
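The two expressions in the definition can be compared directly on a small example. The function names and the two distributions below are hypothetical and serve only as an illustration.

```python
def total_variation(mu, nu):
    """Half the L1 distance between two distributions given as lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))


def total_variation_positive_part(mu, nu):
    """Equivalent form: sum of mu(x) - nu(x) over states where mu exceeds nu."""
    return sum(a - b for a, b in zip(mu, nu) if a > b)


# Two hypothetical distributions on a three-element state space.
mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.3, 0.5]
print(total_variation(mu, nu), total_variation_positive_part(mu, nu))
```

The two forms agree because both distributions sum to one, so the mass by which one exceeds the other on some states equals the mass by which it falls short on the rest.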



Note that the total variation distance satisfies the usual triangle inequality of 
distance measures, i.e., 

$$\|\mu - \hat{\mu}\| \le \|\mu - \mu'\| + \|\mu' - \hat{\mu}\|$$

for every distribution $\mu'$. Moreover, the following monotonicity properties hold: 



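$$\|\mu P - \hat{\mu} P\| \le \|\mu - \hat{\mu}\| , \qquad (10)$$

$$\|\mu P - \mu \hat{P}\| \le \sup_{x \in \Omega} \|P(x, \cdot) - \hat{P}(x, \cdot)\| , \qquad (11)$$

$$\|\mu P - \hat{\mu} P\| \le \sup_{x, y \in \Omega} \|P(x, \cdot) - P(y, \cdot)\| , \qquad (12)$$

where $P$ and $\hat{P}$ are stochastic matrices. Indeed, as for (10) we have 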



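$$\|\mu P - \hat{\mu} P\| = \|(\mu - \hat{\mu}) P\| = \frac{1}{2} \sum_{y \in \Omega} \left| \sum_{x \in \Omega} \left(\mu(x) - \hat{\mu}(x)\right) P(x, y) \right| \le \frac{1}{2} \sum_{x \in \Omega} |\mu(x) - \hat{\mu}(x)| \sum_{y \in \Omega} P(x, y) = \|\mu - \hat{\mu}\| .$$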



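As for (11), observe that 

$$\|\mu P - \mu \hat{P}\| = \|\mu (P - \hat{P})\| = \frac{1}{2} \sum_{y \in \Omega} \left| \sum_{x \in \Omega} \mu(x) \left(P(x, y) - \hat{P}(x, y)\right) \right| \le \sum_{x \in \Omega} \mu(x) \left( \frac{1}{2} \sum_{y \in \Omega} |P(x, y) - \hat{P}(x, y)| \right) \le \sup_{x \in \Omega} \|P(x, \cdot) - \hat{P}(x, \cdot)\| .$$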



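Finally, for (12) we have 

$$\|\mu P - \hat{\mu} P\| = \left\| \sum_{x \in \Omega} \mu(x) \sum_{y \in \Omega} \hat{\mu}(y) \left(P(x, \cdot) - P(y, \cdot)\right) \right\| \le \sum_{x \in \Omega} \mu(x) \sum_{y \in \Omega} \hat{\mu}(y) \, \|P(x, \cdot) - P(y, \cdot)\| \le \sup_{x, y \in \Omega} \|P(x, \cdot) - P(y, \cdot)\| .$$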
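The three monotonicity properties can be sanity-checked on randomly generated instances. This is a sketch for illustration, not a proof: the helper names, the state-space size, the number of trials and the seed are all hypothetical choices.

```python
import random


def total_variation(mu, nu):
    """Half the L1 distance between two distributions given as lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))


def random_distribution(size, rng):
    """A random probability vector with strictly positive entries."""
    weights = [rng.random() + 1e-9 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def push(mu, P):
    """Row vector mu times stochastic matrix P."""
    size = len(mu)
    return [sum(mu[x] * P[x][y] for x in range(size)) for y in range(size)]


def check_monotonicity(trials, size, seed, tolerance=1e-12):
    """Return True if (10), (11) and (12) hold on every sampled instance."""
    rng = random.Random(seed)
    for _ in range(trials):
        mu = random_distribution(size, rng)
        nu = random_distribution(size, rng)
        P = [random_distribution(size, rng) for _ in range(size)]
        P_hat = [random_distribution(size, rng) for _ in range(size)]
        same_matrix_gap = total_variation(push(mu, P), push(nu, P))
        # (10): applying the same matrix cannot increase the distance.
        if same_matrix_gap > total_variation(mu, nu) + tolerance:
            return False
        # (11): same start, two matrices, bounded by the worst row gap.
        row_gap = max(total_variation(P[x], P_hat[x]) for x in range(size))
        if total_variation(push(mu, P), push(mu, P_hat)) > row_gap + tolerance:
            return False
        # (12): bounded by the largest distance between two rows of P.
        pair_gap = max(total_variation(P[x], P[y])
                       for x in range(size) for y in range(size))
        if same_matrix_gap > pair_gap + tolerance:
            return False
    return True


print(check_monotonicity(trials=500, size=5, seed=0))
```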
C Logit Equilibria and NBR-Reducible Games 



Let G be a game NBR-reducible to $\hat{G}$. Let $\pi$ be the stationary 
distribution of the logit dynamics for G and $\hat{\pi}$ be the stationary 
distribution of the restriction of this dynamics to $\hat{G}$. Then the following 
lemma holds for $\beta$ large enough. 

Lemma 18. For every $\delta > 0$, 

$$\|\pi - \hat{\pi}\| \le \delta$$

for $\beta$ sufficiently large. 



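Proof. Let $\tau = t_{\mathrm{mix}}(\delta/8)$ be the mixing time of the 
restricted chain. Consider first two copies of the chain starting in profiles 
$x, y \in \hat{G}$ and bound the total variation after $\tau$ time steps: 

$$\|P^\tau(x, \cdot) - P^\tau(y, \cdot)\| \le \|P^\tau(x, \cdot) - \hat{P}^\tau(x, \cdot)\| + \|\hat{P}^\tau(x, \cdot) - \hat{\pi}\| + \|\hat{\pi} - \hat{P}^\tau(y, \cdot)\| + \|\hat{P}^\tau(y, \cdot) - P^\tau(y, \cdot)\| \le 4 \cdot \frac{\delta}{8} = \delta/2 ,$$

where the last inequality is due to Lemma 14 by taking $\beta$ sufficiently large. 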



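Consider an interval $T$ of length 
$R \cdot \frac{\log(8\ell/\delta)}{\log(1/\varepsilon)}$. By applying Lemma 10 
with $k = \ell$, $(R, \varepsilon) = (T, \delta/(8\ell))$ and $\beta$ 
sufficiently large, we have that for every $x \in G$ 

$$\Pr\left(X_{\ell T} \notin \hat{G}\right) \le \delta/8 .$$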
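Let $t^* = \ell T + \tau$ and $Q = P^{\ell T}$. Then, for every $y \in G$, 

$$\|\pi - P^{t^*}(y, \cdot)\| \le \max_{x \in G} \|P^{t^*}(x, \cdot) - P^{t^*}(y, \cdot)\| ,$$

and, for every $x, y \in G$, by the triangle inequality, 

$$\|P^{t^*}(x, \cdot) - P^{t^*}(y, \cdot)\| = \|Q(x, \cdot) P^\tau - Q(y, \cdot) P^\tau\| \le \|Q(x, \cdot) P^\tau - \hat{Q}(x, \cdot) P^\tau\| + \|\hat{Q}(x, \cdot) P^\tau - \hat{Q}(y, \cdot) P^\tau\| + \|\hat{Q}(y, \cdot) P^\tau - Q(y, \cdot) P^\tau\| ,$$

where, for every $x, y \in G$, we set 

$$\hat{Q}(x, y) = \begin{cases} \ldots \\ 0, & \text{otherwise.} \end{cases}$$

By (10) we obtain 

$$\|Q(x, \cdot) P^\tau - \hat{Q}(x, \cdot) P^\tau\| \le \|Q(x, \cdot) - \hat{Q}(x, \cdot)\| \le \Pr\left(X_{\ell T} \notin \hat{G}\right) \le \delta/8 .$$

By (12) we obtain 

$$\|\hat{Q}(x, \cdot) P^\tau - \hat{Q}(y, \cdot) P^\tau\| \le \max_{x, y \in \hat{G}} \|P^\tau(x, \cdot) - P^\tau(y, \cdot)\| \le \delta/2 ,$$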
and hence $\|\pi - P^{t^*}(y, \cdot)\| \le 3\delta/4$. Finally, for every 
$x \in \hat{G}$, by the triangle inequality, 

$$\|\pi - \hat{\pi}\| \le \|\pi - P^{t^*}(x, \cdot)\| + \|P^{t^*}(x, \cdot) - \hat{P}^{t^*}(x, \cdot)\| + \|\hat{P}^{t^*}(x, \cdot) - \hat{\pi}\| \le 3\delta/4 + \delta/8 + \delta/8 = \delta .$$



□ 
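The statement of Lemma 18 can be illustrated on a toy instance. Everything specific below is an assumption made for the sketch: the prisoner's dilemma payoffs, the helper names, and the choice to treat this game as NBR-reducible to the single profile (D, D), on the grounds that C is never a best response. The stationary distribution is approximated by repeated multiplication rather than computed in closed form.

```python
import itertools
import math

# Hypothetical prisoner's dilemma: PAYOFF[profile][i] is the utility of player i.
PAYOFF = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}


def logit_matrix(beta):
    """Transition matrix (9) of the logit dynamics for the game above."""
    profiles = list(itertools.product(("C", "D"), repeat=2))
    index = {x: a for a, x in enumerate(profiles)}
    P = [[0.0] * len(profiles) for _ in profiles]
    for x in profiles:
        for i in range(2):
            options = [x[:i] + (s,) + x[i + 1:] for s in ("C", "D")]
            weights = [math.exp(beta * PAYOFF[y][i]) for y in options]
            z = sum(weights)
            for y, w in zip(options, weights):
                P[index[x]][index[y]] += w / (2 * z)
    return profiles, P


def stationary(P, iterations=5000):
    """Approximate the stationary distribution by iterating dist -> dist * P."""
    size = len(P)
    dist = [1.0 / size] * size
    for _ in range(iterations):
        dist = [sum(dist[a] * P[a][b] for a in range(size)) for b in range(size)]
    return dist


def distance_to_restricted(beta):
    """Total variation between pi and the point mass on (D, D)."""
    profiles, P = logit_matrix(beta)
    pi = stationary(P)
    pi_hat = [1.0 if x == ("D", "D") else 0.0 for x in profiles]
    return 0.5 * sum(abs(a - b) for a, b in zip(pi, pi_hat))


for beta in (0.0, 1.0, 2.0, 5.0):
    print(beta, distance_to_restricted(beta))
```

In this instance the restricted game has a single profile, so the distance is just the stationary mass outside (D, D), and it shrinks as $\beta$ grows.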






